{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CommaSeparatedListOutputParser\n",
    "\n",
    "- Author: [Junseong Kim](https://www.linkedin.com/in/%EC%A4%80%EC%84%B1-%EA%B9%80-591b351b2/)\n",
    "- Peer Review : [Teddy Lee](https://github.com/teddylee777), [stsr1284](https://github.com/stsr1284), [brian604](https://github.com/brian604)\n",
    "- Proofread : [BokyungisaGod](https://github.com/BokyungisaGod)\n",
    "- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/03-OutputParser/02-CommaSeparatedListOutputParser.ipynb)[![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/03-OutputParser/02-CommaSeparatedListOutputParser.ipynb)\n",
    "## Overview\n",
    "\n",
    "The ```CommaSeparatedListOutputParser``` is a specialized output parser in LangChain designed for generating structured outputs in the form of comma-separated lists.\n",
    "\n",
    "It simplifies the process of extracting and presenting data in a clear and concise list format, making it particularly useful for organizing information such as data points, names, items, or other structured values. By leveraging this parser, users can enhance data clarity, ensure consistent formatting, and improve workflow efficiency, especially in applications where structured outputs are essential.\n",
    "\n",
    "This tutorial demonstrates how to use the ```CommaSeparatedListOutputParser``` to:\n",
    "\n",
    "  1. Set up and initialize the parser for generating comma-separated lists\n",
    "  2. Integrate it with a prompt template and language model\n",
    "  3. Process structured outputs iteratively using streaming mechanisms\n",
    "\n",
    "\n",
    "### Table of Contents\n",
    "\n",
    "- [Overview](#overview)\n",
    "- [Environment Setup](#environment-setup)\n",
    "- [Implementing the CommaSeparatedListOutputParser](#implementing-the-commaseparatedlistoutputparser)\n",
    "- [Using Streamed Outputs](#using-streamed-outputs)\n",
    "\n",
    "### References\n",
    "\n",
    "- [LangChain ChatOpenAI API reference](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html)\n",
    "- [LangChain Core Output Parsers](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.list.CommaSeparatedListOutputParser.html#)\n",
    "- [Python List Tutorial](https://docs.python.org/3.13/tutorial/datastructures.html)\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Environment Setup\n",
    "\n",
    "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
    "\n",
    "**[Note]**\n",
    "- ```langchain-opentutorial``` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
    "- You can checkout the [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture --no-stderr\n",
    "%pip install langchain-opentutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required packages\n",
    "from langchain_opentutorial import package\n",
    "\n",
    "package.install(\n",
    "    [\n",
    "        \"langsmith\",\n",
    "        \"langchain\",\n",
    "        \"langchain_openai\",\n",
    "        \"langchain_community\",\n",
    "    ],\n",
    "    verbose=False,\n",
    "    upgrade=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Environment variables have been set successfully.\n"
     ]
    }
   ],
   "source": [
    "# Set environment variables\n",
    "from langchain_opentutorial import set_env\n",
    "\n",
    "set_env(\n",
    "    {\n",
    "        \"OPENAI_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_TRACING_V2\": \"true\",\n",
    "        \"LANGCHAIN_ENDPOINT\": \"https://api.smith.langchain.com\",\n",
    "        \"LANGCHAIN_PROJECT\": \"02-CommaSeparatedListOutputParser\",\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can alternatively set ```OPENAI_API_KEY``` in ```.env``` file and load it. \n",
    "\n",
    "[Note] This is not necessary if you've already set ```OPENAI_API_KEY``` in previous steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Implementing the ```CommaSeparatedListOutputParser```\n",
    "If you need to generate outputs in the form of a comma-separated list, the ```CommaSeparatedListOutputParser``` from LangChain simplifies the process. Below is a step-by-step implementation:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Importing Required Modules\n",
    "Start by importing the necessary modules and initializing the ```CommaSeparatedListOutputParser```. Retrieve the formatting instructions from the parser to guide the output structure.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`\n"
     ]
    }
   ],
   "source": [
    "from langchain_core.output_parsers import CommaSeparatedListOutputParser\n",
    "\n",
    "# Initialize the output parser\n",
    "output_parser = CommaSeparatedListOutputParser()\n",
    "\n",
    "# Retrieve format instructions for the output parser\n",
    "format_instructions = output_parser.get_format_instructions()\n",
    "print(format_instructions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Creating the Prompt Template\n",
    "Define a ```PromptTemplate``` that dynamically generates a list of items. The placeholder subject will be replaced with the desired topic during execution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "input_variables=['subject'] input_types={} partial_variables={'format_instructions': 'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'} template='List five {subject}.\\n{format_instructions}'\n"
     ]
    }
   ],
   "source": [
    "from langchain_core.prompts import PromptTemplate\n",
    "\n",
    "# Define the prompt template\n",
    "prompt = PromptTemplate(\n",
    "    template=\"List five {subject}.\\n{format_instructions}\",\n",
    "    input_variables=[\"subject\"],  # 'subject' will be dynamically replaced\n",
    "    partial_variables={\n",
    "        \"format_instructions\": format_instructions\n",
    "    },  # Use parser's format instructions\n",
    ")\n",
    "print(prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Integrating with ```ChatOpenAI``` and Running the Chain\n",
    "Combine the ```PromptTemplate```, ```ChatOpenAI``` model, and ```CommaSeparatedListOutputParser``` into a chain. Finally, run the chain with a specific ```subject``` to produce results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Gyeongbokgung Palace', 'N Seoul Tower', 'Bukchon Hanok Village', 'Seongsan Ilchulbong Peak', 'Haeundae Beach']\n"
     ]
    }
   ],
   "source": [
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "# Initialize the ChatOpenAI model\n",
    "model = ChatOpenAI(temperature=0)\n",
    "\n",
    "# Combine the prompt, model, and output parser into a chain\n",
    "chain = prompt | model | output_parser\n",
    "\n",
    "# Run the chain with a specific subject\n",
    "result = chain.invoke({\"subject\": \"famous landmarks in South Korea\"})\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Accessing Data with Python Indexing\n",
    "Since the ```CommaSeparatedListOutputParser``` automatically formats the output as a Python list, you can easily access individual elements using indexing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First Landmark: Gyeongbokgung Palace\n",
      "Second Landmark: N Seoul Tower\n",
      "Last Landmark: Haeundae Beach\n"
     ]
    }
   ],
   "source": [
    "# Accessing specific elements using Python indexing\n",
    "print(\"First Landmark:\", result[0])\n",
    "print(\"Second Landmark:\", result[1])\n",
    "print(\"Last Landmark:\", result[-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using Streamed Outputs\n",
    "For larger outputs or real-time feedback, you can process the results using the ```stream``` method. This allows you to handle data piece by piece as it is generated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Gyeongbokgung Palace']\n",
      "['N Seoul Tower']\n",
      "['Bukchon Hanok Village']\n",
      "['Seongsan Ilchulbong Peak']\n",
      "['Haeundae Beach']\n"
     ]
    }
   ],
   "source": [
    "# Iterate through the streamed output for a subject\n",
    "for output in chain.stream({\"subject\": \"famous landmarks in South Korea\"}):\n",
    "    print(output)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain-opentutorial-QDzDRI-1-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
